On randomness reduction in the Johnson-Lindenstrauss lemma

Pawel Wolff



Abstract 

A refinement of the so-called fast Johnson-Lindenstrauss transform (Ailon and Chazelle [2], Matoušek [17]) is proposed. While it preserves the time efficiency and simplicity of implementation of the original construction, it reduces the randomness used to generate the random transformation. In the analysis of the construction two auxiliary results are established which might be of independent interest: a Bernstein-type inequality for a sum of a random sample from a family of independent random variables and a normal approximation result for such a sum.

1 Introduction

The Johnson-Lindenstrauss lemma [15] is the following fact, which might appear quite surprising at first sight:

Theorem 1. Let $\varepsilon \in (0,1)$, let $X$ be an $N$-point subset of $\ell_2^n$ and let $d \ge C\varepsilon^{-2}\log N$, where $C > 0$ is some universal constant. Then there exists a (linear) mapping $f\colon \ell_2^n \to \ell_2^d$ such that

$$\forall_{x,y\in X} \quad (1-\varepsilon)\|x-y\|_2 \le \|f(x)-f(y)\|_2 \le (1+\varepsilon)\|x-y\|_2. \tag{1}$$

Despite the original motivation for the Johnson-Lindenstrauss lemma, it quickly became clear that this fact is of great importance in applications, especially in the design of algorithms which process high-dimensional data (see e.g. [13], [1] and references therein). For this reason, many application-oriented variants of the above result have appeared quite recently, e.g. [1], [2], [17], among others. In this paper we propose a refinement of the results of Ailon and Chazelle [2] and Matoušek [17]. In order to put our work into context, we briefly sketch the basic ideas behind the by now classical proofs of the Johnson-Lindenstrauss lemma and comment on the papers [2] and [17] in some more detail.

Most of the known proofs of the Johnson-Lindenstrauss lemma establish the existence of the map $f$ by drawing it according to some probability distribution on the space of $d \times n$ matrices and showing that it satisfies (1) with positive probability. The original proof of the Johnson-Lindenstrauss lemma [15] takes the map $f$ to be an orthogonal projection onto a random $d$-dimensional subspace of $\ell_2^n$ (random subspace means here a subspace drawn according to the normalized Haar measure on the Grassmannian $G_{d,n}$). It turns out (by means of a concentration inequality or, as originally, isoperimetry) that whenever $d \ge K\varepsilon^{-2}$ for a given constant $K > 0$, the probability that $f$ maps any fixed vector $v \in \ell_2^n$ onto a vector of length $(1 \pm \varepsilon)\sqrt{d/n}\,\|v\|_2$ is at least $1 - C\exp(-cK)$, where $c, C > 0$ are universal constants. Taking $K$ of order $\log N$ ensures that the failure probability is less than $N^{-2}$; thus, taking the union bound over $\binom{N}{2}$ vectors $v = x - y$ ($x, y \in X$), the probability that the map $\sqrt{n/d}\,f$ fails (1) is less than 1/2. Instead of using random orthogonal projections one can use a random matrix whose entries are i.i.d. Gaussian random variables [9] or the properly normalized matrix of independent random signs [1]. In each of these cases, however, the time complexity of evaluating $f(x)$ for a single point $x \in X$ is $O(nd) = O(n\log N/\varepsilon^2)$, which sometimes is too much for practical applications. Also, the amount of randomness (measured in the number of random bits, i.e. unbiased coin tosses) required to generate $f$ is $O(nd)$.
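To make the cost of the dense constructions concrete, here is a minimal Python sketch of the random-sign variant: a $d \times n$ matrix of independent signs scaled by $1/\sqrt{d}$, applied by a plain $O(nd)$ product. The function names, the seed and the sizes $n = 512$, $d = 128$ are illustrative assumptions and are not taken from the paper.

```python
import math
import random

def random_sign_matrix(d, n, rng):
    """d x n matrix of independent random signs, scaled by 1/sqrt(d)."""
    scale = 1.0 / math.sqrt(d)
    return [[scale * rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(d)]

def apply_matrix(matrix, x):
    """Plain matrix-vector product; this is the O(n*d) step."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]

def norm2(x):
    return math.sqrt(sum(t * t for t in x))

rng = random.Random(0)          # fixed seed, for reproducibility only
n, d = 512, 128                 # illustrative sizes
f = random_sign_matrix(d, n, rng)
points = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(5)]

# Ratio ||f(x) - f(y)|| / ||x - y|| for every pair of points.
ratios = []
for i in range(len(points)):
    for j in range(i + 1, len(points)):
        diff = [a - b for a, b in zip(points[i], points[j])]
        ratios.append(norm2(apply_matrix(f, diff)) / norm2(diff))
```

Generating this matrix consumes one random bit per entry, that is $nd$ bits in total, which is the quantity the constructions discussed next aim to reduce.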

Ailon and Chazelle [2] proposed a construction of a random map $f$ which they called a fast Johnson-Lindenstrauss transform, because in a wide range of parameters (basically $N$ vs. $n$) it is computationally more efficient than the constructions described above. Assuming $n$ is a power of 2, they take $f = PHD$ where $D$ is an $n \times n$ diagonal matrix of random signs, $H$ is the matrix of the Walsh-Hadamard transform on $\ell_2^n$ normalized by the factor $1/\sqrt{n}$ (so that $H$ is an orthogonal matrix with all entries being $\pm 1/\sqrt{n}$), and $P$ is some sparse random $d \times n$ matrix. The transformation $HD$ is an isometry on $\ell_2^n$ and it can be shown that with probability close to 1 it maps any fixed unit vector $u \in \ell_2^n$ onto a vector $v \in \ell_2^n$ with small $\ell_\infty$-norm. More precisely, for any constant $C > 0$, if $\|u\|_2 = 1$ and $v = HDu$, then with probability at least $1 - N^{-C}$,

$$\|v\|_\infty \le C'\sqrt{\frac{\log N}{n}}, \tag{2}$$

where $C' = C'(C) > 0$ is a constant depending on $C$ only. This property is essential for the construction of the matrix $P$, which is as follows: fix $q \in (0,1]$ and set $P$ to be a matrix of independent random entries, each entry being equal to 0 with probability $1 - q$ and, with probability $q$, an $N(0, 1/(dq))$ random variable. It turns out that for any fixed unit vector $v \in \ell_2^n$ satisfying a bound of the form (2), the probability that $1 - \varepsilon \le \|Pv\|_2 \le 1 + \varepsilon$ is at least $1 - N^{-\Omega(1)}$. Together with (2) it implies that the map $f$ will work whenever $q \ge C(\log N)\log(N/\varepsilon)/n$. This means the expected time of applying $P$ to a single vector is $O(dqn) = O(\log^3 N/\varepsilon^2)$ (here we assume $\log(1/\varepsilon) = O(\log N)$). Since the transformation $HD$ can be applied in time $O(n\log n)$ using the Fast Fourier Transform over the group $(\mathbb{Z}_2)^{\log_2 n}$, this construction beats the previous approaches in terms of time complexity whenever $\log N = o(n^{1/2})$ and $\log N = \omega(\log n)$.
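The $O(n\log n)$ application of $HD$ can be illustrated with a short Python sketch of the standard in-place fast Walsh-Hadamard transform. The function name `fwht`, the seed and the choice of a coordinate vector as input are assumptions made for illustration; the signs here are fully independent.

```python
import math
import random

def fwht(x):
    """Normalized fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    v = list(x)
    n = len(v)
    h = 1
    while h < n:
        # One butterfly pass; log2(n) passes of n operations each.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = v[i], v[i + h]
                v[i], v[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [scale * t for t in v]

rng = random.Random(1)
n = 1024
u = [0.0] * n
u[3] = 1.0                                        # a maximally "spiky" unit vector
signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
v = fwht([s * t for s, t in zip(signs, u)])       # v = H D u

l2_norm = math.sqrt(sum(t * t for t in v))        # stays 1, since HD is an isometry
linf_norm = max(abs(t) for t in v)                # drops from 1 to 1/sqrt(n)
```

For this particular input every coordinate of $HDu$ has absolute value $1/\sqrt{n}$, the flattest possible outcome; for general unit vectors the flattening holds only with high probability.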

Since the use of Gaussian random variables in the matrix $P$ generally causes some extra technical problems in a practical implementation, Matoušek [17] refined the result of Ailon and Chazelle, replacing Gaussian r.v.'s with random signs (Bernoulli ±1 r.v.'s). Also, in both papers, a similar property for the map $f$ as a map from $\ell_2^n$ to $\ell_1^d$ was proved.

Generating the matrix $P$ described above requires roughly $nd\log_2(1/q)$ random bits. This can be significantly reduced if we slightly change the distribution from which $P$ is sampled. Instead of fully independent entries, let $P$ now have only independent rows, and within each row we choose $k = nq$ entries at random (without replacement) in which we put a random sign. The remaining entries are zeros. This can be done using $O(k\log n) = O(\log^2 N\log n)$ random bits per row (see Section 3 for details). Additionally, in the matrix $D$ it is enough to have only $O(\log N)$-independent Bernoulli ±1 random variables, which can be modeled on the probability space $\{0,1\}^{O(\log N\log n)}$ (see e.g. [4, Proposition 6.5] or [18, Chapter 7.6, Theorem 8]). Therefore, in total we use $O(\log^3 N\log n/\varepsilon^2)$ random bits and keep the computational efficiency and ease of practical implementation of the constructions from [2] and [17].

The probabilistic analysis, similarly to the one done by Matoušek in [17], relies on tail estimates for sums of random variables. Here, however, these random variables are not fully independent. The main tools we establish to perform the analysis are a Bernstein-type inequality and a Berry-Esseen bound for a sum of a random sample from a family of independent random variables. Although these results are not entirely new (see the comments following Theorem 7 and Theorem 13), we believe they still might be of some interest. Also, having potential applications of the result in mind, we provide explicit and reasonable numerical constants in the estimates of the parameters of our construction.

To finish the introduction, let us note that recently a couple of other results in this area have appeared, see [16] and references therein. Although these results beat ours in terms of the amount of randomness, the methods used there are (at least in part) quite different from ours and do not seem to work in the case of embedding into $\ell_1$.

2 Notation 

Throughout the rest of the paper we use the following notation. Let $\varepsilon \in (0,1)$, $\delta \in (0,\tfrac12)$ and a positive integer $n$ be fixed parameters. Our goal is to construct a random linear map $f$ which acts from $\ell_2^n$ to a space ($\ell_2$ or $\ell_1$) of a smaller dimension and satisfies the following property: for any fixed $u \in \ell_2^n$,

$$\mathbb{P}\big((1-\varepsilon)\|u\|_2 \le \|f(u)\| \le (1+\varepsilon)\|u\|_2\big) \ge 1 - 2\delta.$$

Assume $n$ is a power of 2 (if necessary we augment a vector $u$ with zeros). Let $d$ and $k \le n$ be positive integers to be specified later. We shall consider the following families of random variables:

• $\beta_1, \beta_2, \ldots, \beta_n$ are symmetric ±1 random variables which are $l$-independent with $l := 2\lceil\log(n/\delta)\rceil$, i.e. any $l$ of these random variables are independent. If $l \ge n$ then $\beta_1, \ldots, \beta_n$ are just independent.

• $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are independent symmetric ±1 Bernoulli random variables. For $i = 1, \ldots, d$, $(\varepsilon_{i,1}, \varepsilon_{i,2}, \ldots, \varepsilon_{i,n})$ are independent copies of $(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)$.

• $\xi_1, \xi_2, \ldots, \xi_n$ are 0-1 random variables such that the distribution of the random set $\{j : \xi_j = 1\} \subset \{1, \ldots, n\}$ is uniform over all subsets of $\{1, \ldots, n\}$ with cardinality $k$. In other words, for any $J \subset \{1, \ldots, n\}$ with cardinality $k$, $\mathbb{P}(\{j : \xi_j = 1\} = J) = \binom{n}{k}^{-1}$. For $i = 1, \ldots, d$, $(\xi_{i,1}, \xi_{i,2}, \ldots, \xi_{i,n})$ are independent copies of $(\xi_1, \xi_2, \ldots, \xi_n)$.

Moreover, all the families of random variables are independent, that is, the σ-fields $\sigma(\beta_1, \ldots, \beta_n)$, $\sigma(\varepsilon_1, \ldots, \varepsilon_n)$, $\sigma(\varepsilon_{1,1}, \ldots, \varepsilon_{1,n})$, ..., $\sigma(\varepsilon_{d,1}, \ldots, \varepsilon_{d,n})$, $\sigma(\xi_1, \ldots, \xi_n)$, $\sigma(\xi_{1,1}, \ldots, \xi_{1,n})$, ..., $\sigma(\xi_{d,1}, \ldots, \xi_{d,n})$ are independent.

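For $q \in \{1, 2\}$ we define a random linear map $f_q\colon \ell_2^n \to \ell_q^d$ as follows:

$$f_q = \frac{1}{d^{1/q}}\,PHD, \tag{3}$$

where

$$P = \sqrt{\frac{n}{k}}\begin{pmatrix} \xi_{1,1}\varepsilon_{1,1} & \cdots & \xi_{1,n}\varepsilon_{1,n} \\ \vdots & & \vdots \\ \xi_{d,1}\varepsilon_{d,1} & \cdots & \xi_{d,n}\varepsilon_{d,n} \end{pmatrix}, \qquad D = \operatorname{diag}(\beta_1, \ldots, \beta_n),$$

and $H = H_n$ is the normalized Walsh-Hadamard matrix of size $n \times n$, that is the orthogonal matrix defined by the following recursive formula:

$$H_{2n} = \frac{1}{\sqrt{2}}\begin{pmatrix} H_n & H_n \\ H_n & -H_n \end{pmatrix}, \qquad H_1 = (1).$$

The function log stands for the natural logarithm. We write $\|\cdot\|_q$ for the $\ell_q$ norm ($1 \le q \le \infty$).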
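The recursive definition of the normalized Walsh-Hadamard matrix can be sketched directly in Python. This dense construction is for illustration only (in practice one applies $H$ via a fast transform rather than forming the matrix); the function name `hadamard` and the size 8 are assumptions.

```python
import math

def hadamard(n):
    """Normalized Walsh-Hadamard matrix of size n x n (n a power of 2),
    built from H_{2m} = [[H_m, H_m], [H_m, -H_m]] / sqrt(2), H_1 = (1)."""
    if n == 1:
        return [[1.0]]
    half = hadamard(n // 2)
    c = 1.0 / math.sqrt(2.0)
    top = [[c * a for a in row] + [c * a for a in row] for row in half]
    bottom = [[c * a for a in row] + [-c * a for a in row] for row in half]
    return top + bottom

H = hadamard(8)
# Gram matrix H H^T; it should be the identity because H is orthogonal.
gram = [[sum(H[i][m] * H[j][m] for m in range(8)) for j in range(8)] for i in range(8)]
```

All entries of `H` have absolute value $1/\sqrt{8}$, and `gram` agrees with the identity matrix up to rounding.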



3 The results 



The main result of this paper states that the random transformation $f_q\colon \ell_2^n \to \ell_q^d$ ($q = 1, 2$) defined above with probability close to 1 almost preserves the norm of any fixed vector, provided certain bounds on the parameters $d$ and $k$ hold.

Theorem 2 (the $\ell_2$ case). Assume $n$ is a power of 2,




Then the random linear transformation $f_2$ as defined in (3) satisfies

$$\forall_{u\in\ell_2^n} \quad \mathbb{P}\big((1+\varepsilon)^{-1}\|u\|_2 \le \|f_2(u)\|_2 \le (1+\varepsilon)\|u\|_2\big) \ge 1 - 2\delta$$

provided $k \le n$.

Theorem 3. Assume $n$ is a power of 2. For any constant $\kappa \in (0,1)$, assume



^ ^ TT + vV2 3^ log(2/(5) and fc > max ( ^ , 20e ) log(2n/(5). 

Then the random linear transformation $f_1$ as defined in (3) satisfies

$$\forall_{u\in\ell_2^n} \quad \mathbb{P}\Big((1-\varepsilon)\|u\|_2 \le \sqrt{\tfrac{\pi}{2}}\,\|f_1(u)\|_1 \le (1+\varepsilon)\|u\|_2\Big) \ge 1 - 2\delta$$

provided $k \le n$.

The typical situation in which these results are applied is the one mentioned in the introduction: we have $N$ points in $\ell_2^n$ ($n$ is a power of 2) and we want to embed them into a space of (possibly much lower) dimension $d$ with distortion $1 + \varepsilon$. To this end we choose an embedding at random, as specified in Theorem 2 or 3. Assuming we want the embedding to work with probability at least $1 - p$, where $p \in (0,1)$ is a fixed parameter, we take $\delta = p/N^2$ and apply the union bound over $\binom{N}{2}$ vectors being differences of pairs of points to get that the embedding fails to have distortion $1 + \varepsilon$ with probability at most $\binom{N}{2}\,2\delta \le p$.
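The arithmetic behind the choice $\delta = p/N^2$ can be checked exactly with rational numbers. The following Python sketch uses a hypothetical helper name and the illustrative values $N = 1000$, $p = 1/10$.

```python
from fractions import Fraction
from math import comb

def failure_bound(N, p):
    """Union bound over all pairs of points when each difference vector
    fails with probability at most 2*delta, where delta = p / N^2."""
    delta = Fraction(p) / N**2
    return comb(N, 2) * 2 * delta

bound = failure_bound(1000, Fraction(1, 10))   # equals p * (N - 1) / N
```

In general the bound equals $p(N-1)/N$, which is slightly below $p$ for every $N$.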

In the case of embedding into $\ell_2$,



d 



(l + 2e)^ /3iV2\n r / /edA^^x x ^2nA^2 
^ ^ 1"-' ' max 7.25 log , 55 log ' 



1.55 5 log 



k 



p J J \ P 



satisfy the hypothesis of Theorem 2 unless $k > n$. If indeed $k > n$, or even $k > n/3$, then one can use the construction of Achlioptas [1] which provides a random embedding into $\ell_2^d$ with $d$ similar to ours, roughly with constant 1 instead of 1.55. The embedding is given by a $d \times n$ matrix with entries being independent random variables assuming values $1, 0, -1$ with respective probabilities $\tfrac16, \tfrac23, \tfrac16$. See [1] for details.

Therefore, in what follows, we assume $k \le n/3$. The construction of the random embedding $f_2$ is actually the matter of constructing the random variables $\beta_1, \ldots, \beta_n$ and $\xi_{1,1}\varepsilon_{1,1}, \ldots, \xi_{1,n}\varepsilon_{1,n}, \ldots, \xi_{d,1}\varepsilon_{d,1}, \ldots, \xi_{d,n}\varepsilon_{d,n}$ on a sample space $\{0,1\}^r$ with the uniform probability, where $r$ will be the number of random bits used. Due to the construction of Alon, Babai and Itai [4, Proposition 6.5], the $l$-independent (now $l = 2\lceil\log(N^2 n/p)\rceil$) symmetric ±1 random variables $\beta_1, \ldots, \beta_n$ can be constructed on the uniform sample space $\{0,1\}^{(\log_2 n + 1)l/2 + 1}$, thus $O((\log n)\log(Nn))$ random bits suffice. (Moreover, given an element of the sample space, their construction allows one to compute the sequence $\beta_1, \ldots, \beta_n$ in time $O(ln\log n) = O(n(\log(Nn))\log n)$.) For a given $i \in \{1, \ldots, d\}$, $\xi_{i,1}\varepsilon_{i,1}, \ldots, \xi_{i,n}\varepsilon_{i,n}$ can be constructed as follows:

J ← ∅
while #J < k do
    using log_2 n random bits sample an index j uniformly in {1, . . . , n}
    if j ∉ J then
        J ← J ∪ {j}
    end if
end while

Thus $J$ becomes a random subset of $\{1, \ldots, n\}$ of size $k$. Now set $\xi_{i,j} = 1$ for $j \in J$ and $\xi_{i,j} = 0$ for $j \notin J$ and, using $k$ random bits, sample $\varepsilon_{i,j}$ for each $j \in J$. The total number of random bits used is $k + T_i\log_2 n$, where $T_i$ is the number of iterations made by the while loop. Note that $T_i$ is a sum of $k$ independent geometric random variables with subsequent success probabilities $1, \tfrac{n-1}{n}, \ldots, \tfrac{n-k+1}{n}$ (the success is sampling $j$ not yet contained in $J$). Since $k \le n/3$, every success probability exceeds $2/3$, so a rough estimate gives $\mathbb{E}T_i \le \tfrac32 k$ and $\operatorname{Var}(T_i) \le \tfrac34 k$. Proceeding in the same way for each row, we construct the matrix $P$ by using $dk + T\log_2 n$ random bits, where $T = T_1 + \ldots + T_d$ and the $T_i$ are independent. Thus $\mathbb{E}T \le \tfrac32 kd$ and $\operatorname{Var}(T) \le \tfrac34 kd$. By the Chebyshev inequality, for $\lambda > 0$,

$$\mathbb{P}\Big(T \ge \tfrac32 kd + \lambda\sqrt{\tfrac34 kd}\Big) \le \mathbb{P}\Big(T \ge \mathbb{E}T + \lambda\sqrt{\operatorname{Var}(T)}\Big) \le \lambda^{-2},$$

hence we see that with probability close to 1, $T$ does not exceed some constant times $kd$. (Actually one can derive a much stronger exponential tail estimate for $T$, but it is not essential here.) Overall, the whole construction uses $O\big((\log n)\log(Nn) + dk\log n\big)$ random bits to generate the random embedding $f_2$. Assuming $p$ is fixed, $\log n = O(\log N)$ and $\log(1/\varepsilon) = O(\log N)$, we have $d = O(\varepsilon^{-2}\log N)$, $k = O((\log N)^2)$ and the number of random bits used is $O(\varepsilon^{-2}(\log N)^3\log n)$. The time complexity of applying $f_2$ to a single point is $O(dk + n\log n) = O(\varepsilon^{-2}(\log N)^3 + n\log n)$, using the Fast Fourier Transform.
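The row-sampling procedure and its bit count can be sketched in Python as follows. This is a minimal illustration under stated assumptions: indices are 0-based (the text uses $1, \ldots, n$), the $\sqrt{n/k}$ scaling is omitted, Python's `random.Random.getrandbits` stands in for the source of unbiased coin tosses, and the name `sample_row` is hypothetical.

```python
import random

def sample_row(n, k, rng):
    """Sample one row (xi_j * eps_j)_j and count the random bits consumed.

    n is assumed to be a power of 2, so one uniform index costs log2(n) bits.
    """
    bits_per_index = n.bit_length() - 1        # log2(n) for a power of 2
    chosen = set()
    iterations = 0                             # plays the role of T_i
    while len(chosen) < k:
        j = rng.getrandbits(bits_per_index)    # uniform in {0, ..., n-1}
        iterations += 1
        chosen.add(j)                          # no effect if j is already in J
    row = [0] * n
    for j in chosen:
        row[j] = 1 if rng.getrandbits(1) else -1   # one bit per random sign
    bits_used = k + iterations * bits_per_index
    return row, iterations, bits_used

rng = random.Random(2)
n, k = 1024, 64                                # illustrative sizes with k <= n/3
row, iterations, bits_used = sample_row(n, k, rng)
```

With $k \le n/3$ the number of iterations is typically only slightly larger than $k$, in line with the estimate $\mathbb{E}T_i \le \tfrac32 k$.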

The case of embedding into $\ell_1$ is similar, since the random transformation $f_1$ used has the same structure as $f_2$. Let us note, however, that this time the requirement $k \le n$ in Theorem 3 is in some sense restrictive and cannot be circumvented as previously. For any $\kappa \in (0,1)$,



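$$d = \frac{3.15 + 3.4\kappa\varepsilon}{\kappa^2\varepsilon^2}\,\log\frac{2N^2}{p}, \qquad k = \max\Big(\frac{19.3}{(1-\kappa)^2\varepsilon^2},\ 55\Big)\log\frac{2nN^2}{p}$$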



satisfy the hypothesis of Theorem 3 as long as $k \le n$. If $k \le n/3$ (or some other fraction of $n$), we may proceed with the same algorithm of sampling the matrix $P$. In such a case, the total number of random bits used to generate $f_1$ is $O\big((\log n)\log(Nn) + dk\log n\big)$, which is $O(\varepsilon^{-4}(\log N)^2\log n)$, again assuming $\log n = O(\log N)$ and $\log(1/\varepsilon) = O(\log N)$. If $k$ is between $n/3$ and $n$, we may still get an embedding into $\ell_1^d$. Since the sparsity of the matrix $P$ is then rather poor, one can set $k$ to be $n$ while possibly increasing the parameter $\kappa \in (0,1)$ in order to slightly reduce the target dimension $d$. Of course, having $k = n$ there is no need to sample the $\xi_{i,j}$, as all of them are equal to 1.

Finally, if $k > n$ for all $\kappa \in (0,1)$, which happens when $\log N = \Omega(\varepsilon^2 n)$, then even for $k = n$ Theorem 3 cannot guarantee that the random map $f_1$ is an embedding of a set of $N$ points in $\ell_2^n$ into $\ell_1^d$ with distortion $1 + \varepsilon$. The reason is that the distortion of $f_1$ depends on the error of the approximation

$$\mathbb{E}\Big|\sum_{j=1}^n \varepsilon_j v_j\Big| \approx \mathbb{E}|G| = \sqrt{2/\pi},$$

where $\sum_{j=1}^n v_j^2 = 1$ and $G$ is a standard Gaussian random variable (see the next section for details). If one knows that all $|v_j|$ are small, then the distribution of $\sum_{j=1}^n \varepsilon_j v_j$ is close to the standard Gaussian. (A suitable version of that statement, working also in the case $k < n$ and thus dealing with the sums $\sum_{j=1}^n \xi_j\varepsilon_j v_j$, is formulated as Theorem 13.) On the other hand, if, say, one of the $v_j$'s is close to ±1, then the error of the above approximation is of constant order rather than of order $\varepsilon$. Since $v_1, \ldots, v_n$ are in fact the coordinates of a vector $HDu$, where $u$ is one out of $N$ arbitrary unit vectors in $\ell_2^n$, the bound on $\max_j |v_j|$ that can be guaranteed (with non-negligible probability; the randomness involved is due to the matrix $D$) is of order $\min\{\sqrt{\log(N)/n},\, 1\}$ (see Proposition 5 for the precise statement). This bound becomes trivial if $\log N = \Omega(n)$, but one cannot expect anything essentially better. Indeed, take the set $X$ of $N = 2^n$ unit vectors $(\pm 1/\sqrt{n}, \ldots, \pm 1/\sqrt{n})$ in $\ell_2^n$. Then whatever instance of the matrix $D$ we consider, there is always a vector $u \in X$ such that $HDu = (1, 0, \ldots, 0)$. In other words, $\mathbb{P}(\exists_{u\in X}\ \|HDu\|_\infty = 1) = 1$.
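The worst-case example can be verified numerically: for any sign pattern on the diagonal of $D$, the vector $u = (\beta_1/\sqrt{n}, \ldots, \beta_n/\sqrt{n})$ belongs to $X$ and satisfies $HDu = (1, 0, \ldots, 0)$. A minimal Python sketch follows; the helper `fwht`, the seed and $n = 64$ are illustrative assumptions.

```python
import math
import random

def fwht(x):
    """Normalized fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    v = list(x)
    n = len(v)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = v[i], v[i + h]
                v[i], v[i + h] = a + b, a - b
        h *= 2
    return [t / math.sqrt(n) for t in v]

rng = random.Random(3)
n = 64
signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]      # diagonal of D

# The "bad" vector for this D: u_j = beta_j / sqrt(n), so that D u is constant.
u = [s / math.sqrt(n) for s in signs]
v = fwht([s * t for s, t in zip(signs, u)])              # v = H D u
```

Since $Du$ is the constant vector $(1/\sqrt{n}, \ldots, 1/\sqrt{n})$, which is the first row of $H$, orthogonality of $H$ forces $HDu$ to be the first coordinate vector.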

4 Proofs 

The proof of Theorems 2 and 3 consists of four steps, which we outline below:

1. We show that for any unit vector $u \in \ell_2^n$, the random vector $V = HDu$ typically has $\ell_\infty$ norm less than $C\sqrt{\log(n/\delta)}/\sqrt{n}$.

2. If a unit vector $v \in \ell_2^n$ has small $\ell_\infty$ norm, then each coordinate of the random vector $W = (W_1, \ldots, W_d) = Pv$, which is distributed as the sum $\sqrt{n/k}\sum_{j=1}^n \xi_j\varepsilon_j v_j$, is well concentrated. More precisely, in the case of embedding into $\ell_2$ we shall show that $W_i^2$ is well concentrated around its mean $\mathbb{E}W_i^2$. In the case of $\ell_1$ embedding, we show the concentration of $|W_i|$ around $\mathbb{E}|W_i|$. In both cases we use a version of the Bernstein inequality for a sum of a random sample from a family of independent random variables (Theorem 7).

3. In the case of $\ell_2$ embedding we note that $\mathbb{E}W_i^2$ depends only on the length of the vector $v$, and if $\|v\|_2 = 1$, then $\mathbb{E}W_i^2 = 1$. Since this is no longer true for $\mathbb{E}|W_i|$, in the $\ell_1$ case we shall use a normal approximation of the distribution of $W_i$ (Theorem 13) in order to show that $\mathbb{E}|W_i|$ is close to $\sqrt{2/\pi}$.

4. If the random vector $W$ has all its coordinates well concentrated around a certain value, then $\tfrac1d\|W\|_2^2$ or $\tfrac1d\|W\|_1$ is well concentrated.

In the subsequent sections we elaborate on each of these steps in detail.

4.1 Random signs and the Walsh-Hadamard transform 

Assume $u \in \ell_2^n$ is a unit vector and let $V = (V_1, \ldots, V_n) = HDu$. Since $H\colon \ell_2^n \to \ell_2^n$ is an isometry, $\|V\|_2 = 1$ a.s. Also, $H$ has all entries $\pm 1/\sqrt{n}$, thus each coordinate $V_i$ has the distribution of a random variable $\tfrac{1}{\sqrt{n}}\sum_{j=1}^n \beta_j x_j$, where $x_j = \pm u_j$ (the particular choice of the signs depends on $i$); in particular, $\sum_{j=1}^n x_j^2 = 1$.

Lemma 4. If $\sum_{j=1}^n x_j^2 = 1$ and $S = \sum_{j=1}^n \beta_j x_j$, then

$$\mathbb{P}\Big(|S| \ge \sqrt{2e\log(2n/\delta)}\Big) \le \frac{\delta}{n}.$$

Proof. Recall that $\beta_1, \ldots, \beta_n$ are $l$-independent random variables with $l = 2\lceil\log(n/\delta)\rceil$. Therefore, for a (fully) independent Bernoulli sequence $\varepsilon_j = \pm 1$, we have

$$\mathbb{E}S^l = \mathbb{E}\Big(\sum_{j=1}^n \varepsilon_j x_j\Big)^l$$

(just expand both sides, use linearity of expectation and note that each term involves the expectation of the product of at most $\min(l, n)$ distinct $\beta_j$'s or $\varepsilon_j$'s). The classical Khintchine inequality states

$$\Big(\mathbb{E}\Big|\sum_{j=1}^n \varepsilon_j x_j\Big|^p\Big)^{1/p} \le C_p\Big(\sum_{j=1}^n x_j^2\Big)^{1/2}$$

for any $p \ge 2$ with some constant $C_p$ depending on $p$ only. It follows e.g. from classical hypercontractive estimates for Bernoulli random variables (see [6]) that the inequality holds with $C_p = \sqrt{p-1}$. Taking $p := l = 2\lceil\log(1/\delta) + \log n\rceil$, we thus have

$$(\mathbb{E}S^p)^{1/p} \le \sqrt{p-1} \le \sqrt{2\log(n/\delta) + 1} \le \sqrt{2\log(2n/\delta)},$$

which combined with the Markov inequality

$$\mathbb{P}\big(|S| \ge \sqrt{e}\,(\mathbb{E}|S|^p)^{1/p}\big) \le e^{-p/2} \le \delta/n$$

finishes the proof. □

Taking the union bound over all coordinates of $V$ we immediately arrive at the following

Proposition 5. Let $u \in \ell_2^n$ be a unit vector and let $V = HDu$. Then

$$\mathbb{P}\bigg(\|V\|_\infty \ge \sqrt{\frac{2e\log(2n/\delta)}{n}}\bigg) \le \delta.$$
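A quick empirical look at the $\ell_\infty$ bound, with the threshold $\sqrt{2e\log(2n/\delta)/n}$, can be obtained with the following Python sketch. It is only a sanity check under simplifying assumptions: the signs are fully independent rather than $l$-wise independent, and the values $n = 256$, $\delta = 0.05$, the number of trials and the seed are arbitrary.

```python
import math
import random

def fwht(x):
    """Normalized fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    v = list(x)
    n = len(v)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = v[i], v[i + h]
                v[i], v[i + h] = a + b, a - b
        h *= 2
    return [t / math.sqrt(n) for t in v]

rng = random.Random(4)
n, delta, trials = 256, 0.05, 200
threshold = math.sqrt(2.0 * math.e * math.log(2.0 * n / delta) / n)

# A fixed unit vector u (random direction, chosen once).
g = [rng.gauss(0.0, 1.0) for _ in range(n)]
g_norm = math.sqrt(sum(t * t for t in g))
u = [t / g_norm for t in g]

failures = 0
for _ in range(trials):
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    v = fwht([s * t for s, t in zip(signs, u)])
    if max(abs(t) for t in v) >= threshold:
        failures += 1
failure_rate = failures / trials
```

The observed failure rate is expected to be far below $\delta$, since the bound trades sharpness for explicit constants.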



4.2 Bernstein inequality for a random sample from independent r.v.'s 

First, let us recall the classical Bernstein inequality. Let $Y_1, Y_2, \ldots, Y_n$ be independent random variables with $\mathbb{E}Y_i = 0$. We assume all moments of the $Y_i$'s are finite and for some constant $M > 0$,

$$\mathbb{E}|Y_i|^p \le \frac{p!}{2}\,\sigma_i^2 M^{p-2} \quad \text{for any integer } p \ge 2. \tag{4}$$

The classical inequality of Bernstein provides an estimate for the tail of the sum $Y_1 + \ldots + Y_n$:

Theorem 6. Let $S = Y_1 + \ldots + Y_n$ with $Y_i$'s satisfying (4) and set $\sigma^2 = \sum_{i=1}^n \sigma_i^2$. Then for all $s > 0$,

$$\mathbb{P}(S \ge s) \le \exp\Big(-\frac{s^2}{2\sigma^2 + 2Ms}\Big) \quad\text{and}\quad \mathbb{P}(S \le -s) \le \exp\Big(-\frac{s^2}{2\sigma^2 + 2Ms}\Big).$$

Besides the classical Bernstein inequality, we shall also use its variant for a sum of a random sample of $k$ out of $n$ random variables $Y_1, \ldots, Y_n$, that is for the sum $\sum_{i=1}^n \xi_i Y_i$.

Theorem 7. Let $S = \sum_{i=1}^n \xi_i Y_i$ with $Y_i$'s satisfying (4) and set $\sigma^2 = \sum_{i=1}^n \sigma_i^2$. Then for all $s > 0$,

$$\mathbb{P}(S \ge s) \le \exp\Big(-\frac{s^2}{2\frac{k}{n}\sigma^2 + 2Ms}\Big) \quad\text{and}\quad \mathbb{P}(S \le -s) \le \exp\Big(-\frac{s^2}{2\frac{k}{n}\sigma^2 + 2Ms}\Big).$$

(Recall, $\mathbb{P}(\xi_i = 1) = k/n$.)

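We will need a simple

Lemma 8. For any $A \subset \{1, \ldots, n\}$,

$$\mathbb{E}\prod_{i\in A}\xi_i \le \Big(\frac{k}{n}\Big)^{\#A}.$$

Proof. If $\#A > k$ then $\mathbb{E}\prod_{i\in A}\xi_i = 0$, otherwise

$$\mathbb{E}\prod_{i\in A}\xi_i = \mathbb{P}(\xi_i = 1 \text{ for each } i \in A) = \frac{\binom{n-\#A}{k-\#A}}{\binom{n}{k}} = \frac{k(k-1)\cdots(k-\#A+1)}{n(n-1)\cdots(n-\#A+1)} \le \Big(\frac{k}{n}\Big)^{\#A}.$$

□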
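The inclusion probabilities of a uniformly random $k$-subset can be checked exactly by brute-force enumeration for small parameters. The Python sketch below compares the enumerated probability with the closed form $\binom{n-\#A}{k-\#A}/\binom{n}{k}$ and with $(k/n)^{\#A}$; the helper name and the values $n = 8$, $k = 3$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def joint_inclusion_probability(n, k, a_size):
    """P(xi_i = 1 for all i in A) for a uniform random k-subset of {0..n-1},
    where A = {0, ..., a_size - 1}; computed by enumerating all k-subsets."""
    A = set(range(a_size))
    hits = sum(1 for J in combinations(range(n), k) if A <= set(J))
    return Fraction(hits, comb(n, k))

n, k = 8, 3
probabilities = [joint_inclusion_probability(n, k, m) for m in range(5)]
upper_bounds = [Fraction(k, n) ** m for m in range(5)]
```

For $\#A = 4 > k$ the probability is 0, and for $\#A \le k$ it never exceeds $(k/n)^{\#A}$, which reflects the negative dependence among the $\xi_i$.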



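Proof of Theorem 7. Except for using Lemma 8, the proof follows a standard proof of the Bernstein inequality. We present the proof below for the sake of completeness.

First, for any $i$ and $|t| < 1/M$,

$$\mathbb{E}e^{tY_i} = \sum_{p=0}^\infty \frac{t^p\,\mathbb{E}Y_i^p}{p!} \le 1 + \frac{\sigma_i^2 t^2}{2}\sum_{p=2}^\infty (M|t|)^{p-2} = 1 + \frac{\sigma_i^2 t^2}{2(1 - M|t|)}. \tag{5}$$

To estimate $\mathbb{E}e^{tS}$, we condition on $\mathcal{F} = \sigma(\xi_i : i = 1, \ldots, n)$, use (5) and Lemma 8:

$$\mathbb{E}e^{tS} = \mathbb{E}\prod_{i=1}^n \mathbb{E}\big[e^{t\xi_i Y_i}\,\big|\,\mathcal{F}\big] = \mathbb{E}\prod_{i=1}^n \big(1 - \xi_i + \xi_i\,\mathbb{E}e^{tY_i}\big) \le \mathbb{E}\prod_{i=1}^n \Big(1 + \xi_i\frac{\sigma_i^2 t^2}{2(1 - M|t|)}\Big)$$

$$\le \prod_{i=1}^n \Big(1 + \frac{k}{n}\,\frac{\sigma_i^2 t^2}{2(1 - M|t|)}\Big) \le \exp\Big(\frac{k\sigma^2 t^2}{n\,2(1 - M|t|)}\Big),$$

where the second inequality follows by expanding the product and applying Lemma 8 to each term.

One obtains the inequality for the tail probability $\mathbb{P}(S \ge s)$ by taking $t = \frac{s}{\frac{k}{n}\sigma^2 + Ms}$ and using Chebyshev's inequality. For the lower tail use $\mathbb{P}(S \le -s) = \mathbb{P}(-S \ge s)$. □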

Remark 9. Since the random variables $\xi_1, \ldots, \xi_n$ are negatively associated (see [14]), the above result can be deduced (up to numerical constants) from a quite general comparison result of Shao [19]. See also the paper of Hoeffding [12] for related results.

We use Theorem 7 to obtain concentration for the coordinates of the vector $W = Pv$, i.e.

$$W_i = \sqrt{\frac{n}{k}}\sum_{j=1}^n \xi_{i,j}\varepsilon_{i,j}v_j \quad\text{for } i = 1, \ldots, d.$$

Proposition 10. Assume $v \in \ell_2^n$, $\|v\|_2 = 1$, $\|v\|_\infty \le a$ and let $W = Pv$. Then for $i = 1, \ldots, d$ and any $s > 0$,

$$\mathbb{P}(|W_i| \ge s) \le 2\exp\Big(-\frac{s^2}{2 + \frac23(n/k)^{1/2}as}\Big).$$

Proof. Fix any $i \in \{1, \ldots, d\}$ and set $Y_j = (n/k)^{1/2}\varepsilon_{i,j}v_j$. With $\sigma_j^2 = (n/k)v_j^2$ and $M = (n/k)^{1/2}a/3$, the condition (4) is satisfied. Since $W_i = \sum_{j=1}^n \xi_{i,j}Y_j$ and $\sigma^2 = \sum_{j=1}^n \sigma_j^2 = n/k$, Theorem 7 provides the desired bound on $\mathbb{P}(|W_i| \ge s)$. □

For the sake of providing good numerical constants, besides the tail estimates established above we estimate the first few even moments of $W_i$ under the additional assumption

$$\frac{n}{k}\,a^2 \le r_0 := \frac{1}{10}. \tag{6}$$

Lemma 11. Under the assumptions of Proposition 10, if (6) holds then

$$\mathbb{E}W_i^4 \le 3.1, \quad \mathbb{E}W_i^6 \le 17, \quad \mathbb{E}W_i^8 \le 127, \quad \mathbb{E}W_i^{10} \le 1283.$$

Proof. For $q = 2, 3, 4, 5$, write

$$\mathbb{E}W_i^{2q} = \Big(\frac{n}{k}\Big)^q\,\mathbb{E}\Big(\sum_{j=1}^n \xi_j\varepsilon_j v_j\Big)^{2q}$$

and expand the right-hand side. By the symmetry and independence of the $\varepsilon_j$'s, all the terms in the expansion involving odd powers vanish. Using the fact that for any integer $q_1 \ge 1$,

$$\sum_{j=1}^n v_j^{2q_1} \le a^{2(q_1 - 1)},$$

and Lemma 8 we thus obtain

$$\mathbb{E}W_i^4 \le \Big(\frac{n}{k}\Big)^2 a^2\,\frac{k}{n} + 3\Big(\frac{n}{k}\Big)^2\Big(\frac{k}{n}\Big)^2 = \frac{n}{k}\,a^2 + 3 \le r_0 + 3.$$

Similarly, grouping the terms of the expansion according to the multiplicities of the (distinct) indices involved,

$$\mathbb{E}W_i^6 \le (n/k)^2 a^4 + 15(n/k)a^2 + 15 \le r_0^2 + 15r_0 + 15,$$

and

$$\mathbb{E}W_i^8 \le (n/k)^3 a^6 + 28(n/k)^2 a^4 + 35(n/k)^2 a^4 + 210(n/k)a^2 + 105 \le r_0^3 + 28r_0^2 + 35r_0^2 + 210r_0 + 105,$$

and

$$\mathbb{E}W_i^{10} \le r_0^4 + 45r_0^3 + 210r_0^3 + 630r_0^2 + 1575r_0^2 + 3150r_0 + 945.$$

□
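The numerical constants 3.1, 17, 127 and 1283 can be reproduced mechanically. In the sketch below each partition $q = q_1 + \ldots + q_m$ corresponds to grouping the $2q$ factors into blocks of even sizes $2q_1, \ldots, 2q_m$ on distinct indices; the number of such groupings is $(2q)!/\prod_i (2q_i)!$ divided by the factorials of the multiplicities of repeated block sizes, and each grouping contributes at most $r_0^{q-m}$. This reading of the expansion, as well as the function names, is an assumption made for illustration.

```python
from fractions import Fraction
from math import factorial

def partitions(q, largest=None):
    """Yield the partitions of q as non-increasing tuples of positive integers."""
    if largest is None:
        largest = q
    if q == 0:
        yield ()
        return
    for first in range(min(q, largest), 0, -1):
        for rest in partitions(q - first, first):
            yield (first,) + rest

def moment_bound(q, r0):
    """Upper bound on E W^(2q) as a polynomial in r0 = (n/k) * a^2."""
    total = Fraction(0)
    for parts in partitions(q):
        count = factorial(2 * q)
        for part in parts:
            count //= factorial(2 * part)            # multinomial coefficient
        for part in set(parts):
            count //= factorial(parts.count(part))   # blocks of equal size are unordered
        total += count * r0 ** (q - len(parts))
    return total

r0 = Fraction(1, 10)
bounds = {q: moment_bound(q, r0) for q in (2, 3, 4, 5)}
```

With $r_0 = 1/10$ the four values are $3.1$, $16.51$, $126.631$ and $1282.3051$, which round up to the constants stated in Lemma 11.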

We shall use the following simple lemma to handle the deviation of $W_i^2$ or $|W_i|$ from their means.

Lemma 12. Assume $Y \ge 0$ a.s., $a \ge 0$ and $\Phi\colon \mathbb{R}_+ \to \mathbb{R}_+$ is non-decreasing. Then

$$\mathbb{E}\Phi(|Y - a|) \le \mathbb{E}\Phi(Y) + \Phi(a).$$

Proof. Note that $\Phi(|Y - a|)\mathbf{1}_{\{Y \ge a\}} \le \Phi(Y)$ a.s. and $\Phi(|Y - a|)\mathbf{1}_{\{Y < a\}} \le \Phi(a)$ a.s. Summing up both inequalities and taking the expectation concludes the proof. □

4.3 $\mathbb{E}\|Pv\|_2^2 = d$ and $\mathbb{E}\|Pv\|_1 \approx d\sqrt{2/\pi}$ by normal approximation

Let $v \in \ell_2^n$ be a unit vector and $W = Pv$. Note that $W_1, \ldots, W_d$ are independent and

$$\mathbb{E}W_i^2 = \frac{n}{k}\,\mathbb{E}\Big(\sum_{j=1}^n \xi_j\varepsilon_j v_j\Big)^2 = \frac{n}{k}\sum_{j=1}^n v_j^2\,\mathbb{E}\xi_j = 1,$$

hence $\mathbb{E}\|W\|_2^2 = d$.

The case of the $\ell_1$-norm is more delicate. In principle, $\mathbb{E}|W_i|$ depends on $v = (v_1, \ldots, v_n)$. However, under the assumptions of small $\ell_\infty$-norm of $v$ and $k$ large, the distribution of $W_i$ is approximately Gaussian and thus $\mathbb{E}|W_i|$ can be approximated by $\sqrt{2/\pi}$. To this end we establish a slightly more general result which can be regarded as a Berry-Esseen bound for a random sample from a family of independent random variables.

Theorem 13. Let $n \ge 2$ and let $Y_i$, $i = 1, 2, \ldots, n$, be independent random variables, independent of $(\xi_1, \ldots, \xi_n)$, satisfying $\mathbb{E}Y_i = 0$, $\sum_{i=1}^n \mathbb{E}Y_i^2 = n/k$ and having finite third moments. Denote $X_i = \xi_i Y_i$ and $S = \sum_{i=1}^n X_i$. Then the Wasserstein distance between the distribution of $S$ and the standard normal distribution satisfies

$$d_W(S, G) := \sup_{h\in\mathrm{Lip}(1)} |\mathbb{E}h(S) - \mathbb{E}h(G)| \le 3\sum_{i=1}^n \mathbb{E}|X_i|^3,$$

where $G \sim N(0,1)$ and $\mathrm{Lip}(1)$ is the set of 1-Lipschitz functions on $\mathbb{R}$. Moreover,

$$\Big|\mathbb{E}|S| - \sqrt{2/\pi}\Big| \le \frac32\sum_{i=1}^n \mathbb{E}|X_i|^3 = \frac{3k}{2n}\sum_{i=1}^n \mathbb{E}|Y_i|^3. \tag{7}$$
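For very small parameters the quantity $\mathbb{E}|S|$ can be computed exactly by enumerating all $k$-subsets and all sign patterns, which gives a sanity check of the estimate on $|\mathbb{E}|S| - \sqrt{2/\pi}|$. The Python sketch below takes $Y_j = \sqrt{n/k}\,\varepsilon_j v_j$ with the flat vector $v_j = 1/\sqrt{n}$; the constant $3/2$ in `bound` follows the final estimate of the proof as read here and should be treated as an assumption, as should the function name and the values $n = 8$, $k = 4$.

```python
import math
from itertools import combinations, product

def exact_mean_abs(v, k):
    """E|S| for S = sqrt(n/k) * sum_{j in J} eps_j * v_j, where J is a uniform
    k-subset and eps are independent signs; computed by full enumeration."""
    n = len(v)
    scale = math.sqrt(n / k)
    total, count = 0.0, 0
    for J in combinations(range(n), k):
        for eps in product((-1.0, 1.0), repeat=k):
            total += abs(scale * sum(e * v[j] for e, j in zip(eps, J)))
            count += 1
    return total / count

n, k = 8, 4
v = [1.0 / math.sqrt(n)] * n
mean_abs = exact_mean_abs(v, k)
gap = abs(mean_abs - math.sqrt(2.0 / math.pi))
# (3k / 2n) * sum E|Y_j|^3 = 1.5 * sqrt(n/k) * sum |v_j|^3 for this choice of Y_j.
bound = 1.5 * math.sqrt(n / k) * sum(abs(t) ** 3 for t in v)
```

Here $\mathbb{E}|S| = 0.75$, so the gap to $\sqrt{2/\pi} \approx 0.798$ is about $0.048$, well within the (far from tight at this scale) bound of $0.75$.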

In the literature there exist many related results, most of them concerning a more general problem called the combinatorial central limit theorem. However, the author was not able to find a result which implies (7) with a reasonable numerical constant. The combinatorial central limit theorem roughly states that $S_n = \sum_{i=1}^n X_{i,\pi(i)}$, where $(X_{ij})_{i,j\le n}$ is a matrix of independent random variables having finite third moments and $\pi$ is a random permutation of the set $\{1, 2, \ldots, n\}$, independent of the $X_{ij}$'s, after proper normalization has a distribution close to standard normal. Taking the matrix $(X_{ij})$ whose first $k$ rows are independent copies of the random vector $(Y_1, \ldots, Y_n)$ and whose remaining entries are zeros reduces this to the problem from Theorem 13.

For example, the result of Ho and Chen [11, Theorem 3.1] on the combinatorial CLT implies an estimate similar to (7) but asymptotically weaker. Bolthausen [5] proved an optimal error bound but only in the case of deterministic $(X_{ij})$'s. Recently, Chen and Fang [8] proved the general version of the combinatorial CLT with the optimal rate of normal approximation error. They bound the Kolmogorov distance, which is generally more difficult to handle in comparison to the Wasserstein distance. However, for our purposes the Wasserstein distance is better suited and moreover it is possible to obtain an estimate with a reasonable numerical constant.

As in the results on the combinatorial CLT mentioned above, we employ Stein's method. Except for a few twists, we basically follow the reasoning presented in [7, Section 2], which illustrates the usage of Stein's method in the most basic setting of sums of independent random variables.

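Proof. It is enough to consider a 1-Lipschitz $h\colon \mathbb{R} \to \mathbb{R}$ which is piecewise continuously differentiable. As in [7, Section 2.1], consider the differential equation

$$f'(x) - xf(x) = h(x) - \mathbb{E}h(G) \tag{8}$$

whose solution is given by the formula

$$f(x) = e^{x^2/2}\int_{-\infty}^x \big(h(t) - \mathbb{E}h(G)\big)e^{-t^2/2}\,dt. \tag{9}$$

Note that $f$ is $C^1$ and $f'$ is piecewise continuously differentiable, so for any $a, b \in \mathbb{R}$, $|f'(a) - f'(b)| \le |a - b|\,\|f''\|_\infty$. It turns out [7, Lemma 2.3] that $\|f\|_\infty \le 2\|h'\|_\infty \le 2$, $\|f'\|_\infty \le 4\|h'\|_\infty \le 4$ and $\|f''\|_\infty \le 2\|h'\|_\infty \le 2$.

Putting $S$ as $x$ in (8) and taking the expectation we get

$$\mathbb{E}h(S) - \mathbb{E}h(G) = \mathbb{E}\big(f'(S) - Sf(S)\big).$$

Set $S^{(i)} = S - X_i$ and define

$$K_i(t) = \mathbb{E}X_i\big(\mathbf{1}_{\{0\le t\le X_i\}} - \mathbf{1}_{\{X_i\le t<0\}}\big).$$

Note that $K_i(t) \ge 0$ for all $t \in \mathbb{R}$,

$$\int_{-\infty}^\infty K_i(t)\,dt = \mathbb{E}X_i^2 \quad\text{and}\quad \int_{-\infty}^\infty |t|K_i(t)\,dt = \frac12\,\mathbb{E}|X_i|^3. \tag{10}$$

Since $Y_i$ is mean-zero and independent of $\sigma(S^{(i)}, \xi_i)$, we have $\mathbb{E}X_i f(S^{(i)}) = \mathbb{E}Y_i\,\mathbb{E}\xi_i f(S^{(i)}) = 0$. Therefore

$$\mathbb{E}Sf(S) = \sum_{i=1}^n \mathbb{E}X_i f(S) = \sum_{i=1}^n \mathbb{E}X_i\big(f(S) - f(S^{(i)})\big) = \sum_{i=1}^n \mathbb{E}X_i\int_0^{X_i} f'(S^{(i)} + t)\,dt$$

$$= \sum_{i=1}^n \mathbb{E}\int_{-\infty}^\infty f'(S^{(i)} + t)\,X_i\big(\mathbf{1}_{\{0\le t\le X_i\}} - \mathbf{1}_{\{X_i\le t<0\}}\big)\,dt$$

$$= \sum_{i=1}^n \frac{k}{n}\int_{-\infty}^\infty \mathbb{E}\big[f'(S^{(i)} + t)\,Y_i\big(\mathbf{1}_{\{0\le t\le Y_i\}} - \mathbf{1}_{\{Y_i\le t<0\}}\big)\,\big|\,\xi_i = 1\big]\,dt$$

$$= \sum_{i=1}^n \frac{k}{n}\int_{-\infty}^\infty \mathbb{E}\big[f'(S^{(i)} + t)\,\big|\,\xi_i = 1\big]\,\mathbb{E}Y_i\big(\mathbf{1}_{\{0\le t\le Y_i\}} - \mathbf{1}_{\{Y_i\le t<0\}}\big)\,dt$$

$$= \sum_{i=1}^n \int_{-\infty}^\infty \mathbb{E}\big[f'(S^{(i)} + t)\,\big|\,\xi_i = 1\big]\,K_i(t)\,dt.$$

Since $\sum_{i=1}^n \mathbb{E}X_i^2 = 1$, we have

$$\mathbb{E}f'(S) = \sum_{i=1}^n \int_{-\infty}^\infty \mathbb{E}f'(S)\,K_i(t)\,dt.$$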

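Combining the two preceding identities we get

$$\mathbb{E}\big(f'(S) - Sf(S)\big) = \sum_{i=1}^n \int_{-\infty}^\infty \Big(\mathbb{E}f'(S) - \mathbb{E}\big[f'(S^{(i)} + t)\,\big|\,\xi_i = 1\big]\Big)K_i(t)\,dt = \sum_{i=1}^n \int_{-\infty}^\infty \Big[\frac{k}{n}\,\mathrm{I} + \Big(1 - \frac{k}{n}\Big)\mathrm{II}\Big]K_i(t)\,dt, \tag{11}$$

where, using $\mathbb{P}(\xi_i = 1) = k/n$ and $\mathbb{P}(\xi_i = 0) = 1 - k/n$,

$$\mathrm{I} = \mathbb{E}[f'(S)\,|\,\xi_i = 1] - \mathbb{E}[f'(S^{(i)} + t)\,|\,\xi_i = 1], \qquad \mathrm{II} = \mathbb{E}[f'(S)\,|\,\xi_i = 0] - \mathbb{E}[f'(S^{(i)} + t)\,|\,\xi_i = 1].$$

Next, estimate $|\mathrm{I}|$ and $|\mathrm{II}|$:

$$|\mathrm{I}| \le \mathbb{E}\big[|f'(S) - f'(S^{(i)} + t)|\,\big|\,\xi_i = 1\big] \le \|f''\|_\infty\,\mathbb{E}\big[|X_i - t|\,\big|\,\xi_i = 1\big] \le \|f''\|_\infty\big(\mathbb{E}|Y_i| + |t|\big),$$

and, since $S = S^{(i)}$ on the event $\{\xi_i = 0\}$,

$$|\mathrm{II}| = \big|\mathbb{E}[f'(S^{(i)})\,|\,\xi_i = 0] - \mathbb{E}[f'(S^{(i)} + t)\,|\,\xi_i = 1]\big| \le \big|\mathbb{E}[f'(S^{(i)})\,|\,\xi_i = 0] - \mathbb{E}[f'(S^{(i)})\,|\,\xi_i = 1]\big| + \|f''\|_\infty|t| = |\mathrm{II}_1 - \mathrm{II}_2| + \|f''\|_\infty|t|.$$

Let us define a random variable $J$ which is independent of $(Y_1, \ldots, Y_n)$ and, given the vector $(\xi_1, \ldots, \xi_n)$, is uniformly distributed on $\{j : \xi_j = 1\}$. Note that $\mathcal{L}(S^{(i)} - X_J\,|\,\xi_i = 0) = \mathcal{L}(S^{(i)}\,|\,\xi_i = 1)$ (both refer to the distribution of the sum of a random sample of $k - 1$ random variables out of $Y_1, \ldots, Y_{i-1}, Y_{i+1}, \ldots, Y_n$), thus $\mathbb{E}[f'(S^{(i)} - X_J)\,|\,\xi_i = 0] = \mathrm{II}_2$. Hence

$$|\mathrm{II}_1 - \mathrm{II}_2| \le \mathbb{E}\big[|f'((S^{(i)} - X_J) + X_J) - f'(S^{(i)} - X_J)|\,\big|\,\xi_i = 0\big] \le \|f''\|_\infty\,\mathbb{E}\big[|X_J|\,\big|\,\xi_i = 0\big].$$

Since $\mathcal{L}(J\,|\,\xi_i = 0)$ is uniform on $\{1, \ldots, n\}\setminus\{i\}$ and $\xi_J = 1$ a.s., $\mathbb{E}[|X_J|\,|\,\xi_i = 0] = \frac{1}{n-1}\sum_{j\ne i}\mathbb{E}|Y_j|$, thus

$$|\mathrm{II}| \le \|f''\|_\infty\Big(\frac{1}{n-1}\sum_{j\ne i}\mathbb{E}|Y_j| + |t|\Big).$$

Plugging the bounds on I and II into (11) and using (10) and the identity $\mathbb{E}|X_i|^p = \frac{k}{n}\mathbb{E}|Y_i|^p$ (for $p = 1, 2, 3$), we obtain

$$\big|\mathbb{E}\big(f'(S) - Sf(S)\big)\big| \le \|f''\|_\infty\Big[\frac12\sum_{i=1}^n \mathbb{E}|X_i|^3 + \frac{k^2}{n^2}\sum_{i=1}^n \mathbb{E}|Y_i|\,\mathbb{E}Y_i^2 + \frac{k}{n}\Big(1 - \frac{k}{n}\Big)\frac{1}{n-1}\sum_{i=1}^n\sum_{j\ne i}\mathbb{E}|Y_j|\,\mathbb{E}Y_i^2\Big]$$

$$\le \|f''\|_\infty\Big[\Big(\frac12 + \frac{k}{n}\Big)\sum_{i=1}^n \mathbb{E}|X_i|^3 + \frac{k}{n}\Big(1 - \frac{k}{n}\Big)\frac{1}{n-1}\sum_{i=1}^n\sum_{j\ne i}\big(\mathbb{E}|Y_j|^3\big)^{1/3}\big(\mathbb{E}|Y_i|^3\big)^{2/3}\Big],$$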

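where the last inequality follows from the Hölder inequality. Now, by the standard rearrangement inequality we note that for any integer $s$,

$$\sum_{i=1}^n \big(\mathbb{E}|Y_{i+s}|^3\big)^{1/3}\big(\mathbb{E}|Y_i|^3\big)^{2/3} \le \sum_{i=1}^n \big(\mathbb{E}|Y_i|^3\big)^{1/3}\big(\mathbb{E}|Y_i|^3\big)^{2/3} = \sum_{i=1}^n \mathbb{E}|Y_i|^3$$

(the index $i + s$ is taken modulo $n$ and we set $Y_0 = Y_n$). Hence,

$$\sum_{i=1}^n\sum_{j\ne i}\big(\mathbb{E}|Y_j|^3\big)^{1/3}\big(\mathbb{E}|Y_i|^3\big)^{2/3} = \sum_{s=1}^{n-1}\sum_{i=1}^n \big(\mathbb{E}|Y_{i+s}|^3\big)^{1/3}\big(\mathbb{E}|Y_i|^3\big)^{2/3} \le (n-1)\sum_{i=1}^n \mathbb{E}|Y_i|^3.$$

Finally,

$$\big|\mathbb{E}\big(f'(S) - Sf(S)\big)\big| \le \frac32\,\|f''\|_\infty\sum_{i=1}^n \mathbb{E}|X_i|^3.$$

Together with the bound $\|f''\|_\infty \le 2$ it yields the estimate for $d_W(S, G)$. To bound $\big|\mathbb{E}|S| - \sqrt{2/\pi}\big|$, we take an explicit solution to the equation (8) for $h(x) = |x|$:

$$f(x) = \begin{cases} 1 - 2e^{x^2/2}\Phi(x) & \text{for } x < 0, \\ 2e^{x^2/2}(1 - \Phi(x)) - 1 & \text{for } x \ge 0, \end{cases}$$

where $\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x e^{-t^2/2}\,dt$ is the normal distribution function.

We shall prove that $\|f''\|_\infty = 1$. Note that $f(x)$ is an odd function, thus we compute $f''(x)$ for $x > 0$ only:

$$f''(x) = 2e^{x^2/2}(1 - \Phi(x))(1 + x^2) - x\sqrt{2/\pi}.$$

Since $\lim_{x\to 0^+} f''(x) = 1$, it suffices to show $f''(x) \ge 0$ and $f'''(x) \le 0$ for all $x > 0$. To this end, we use the estimates for the Gaussian tail proved in [20]: for all $x > -1$,

$$\sqrt{\frac{2}{\pi}}\,\frac{1}{x + \sqrt{x^2 + 4}} \le e^{x^2/2}(1 - \Phi(x)) \le \sqrt{\frac{2}{\pi}}\,\frac{2}{3x + \sqrt{x^2 + 8}}. \tag{12}$$

We use the lower bound from (12) to prove $f'' \ge 0$:

$$f''(x) \ge \sqrt{\frac{2}{\pi}}\Big(\frac{2(1 + x^2)}{x + \sqrt{x^2 + 4}} - x\Big) = \sqrt{\frac{2}{\pi}}\,\frac{x^2 + 2 - x\sqrt{x^2 + 4}}{x + \sqrt{x^2 + 4}} \ge 0,$$

and note that $(x^2 + 2)^2 = x^4 + 4x^2 + 4 > \big(x\sqrt{x^2 + 4}\big)^2 = x^4 + 4x^2$.

To prove $f'''(x) = 2x(3 + x^2)e^{x^2/2}(1 - \Phi(x)) - (x^2 + 2)\sqrt{2/\pi} \le 0$ we use the upper bound from (12):

$$2x(3 + x^2)e^{x^2/2}(1 - \Phi(x)) \le \sqrt{\frac{2}{\pi}}\,\frac{4x(3 + x^2)}{3x + \sqrt{x^2 + 8}}.$$

Now it suffices to prove $4x(3 + x^2) \le (x^2 + 2)(3x + \sqrt{x^2 + 8})$, or equivalently

$$x^3 + 6x \le (x^2 + 2)\sqrt{x^2 + 8},$$

which is obvious by calculating $\mathrm{LHS}^2 - \mathrm{RHS}^2 = -32 < 0$.

□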



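Specializing Theorem 13 to the random variables $Y_j = (n/k)^{1/2}\varepsilon_{i,j}v_j$ we arrive at

Proposition 14. Assume $v \in \ell_2^n$, $\|v\|_2 = 1$, $\|v\|_\infty \le a$ and $W = Pv$. Then for all $i = 1, \ldots, d$,

$$\Big|\mathbb{E}|W_i| - \sqrt{2/\pi}\Big| \le \frac32\Big(\frac{n}{k}\Big)^{1/2}a.$$

Proof. By (7),

$$\Big|\mathbb{E}|W_i| - \sqrt{2/\pi}\Big| \le \frac{3k}{2n}\Big(\frac{n}{k}\Big)^{3/2}\sum_{j=1}^n |v_j|^3 \le \frac32\Big(\frac{n}{k}\Big)^{1/2}a.$$

□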



4.4 Concentration of $\|Pv\|_q^q$ and the proof of the main result

Let $v \in \ell_2^n$ be a unit vector and $W = Pv$. In the next two subsections we shall provide deviation bounds for $\big|\tfrac1d\|W\|_2^2 - 1\big|$ and $\big|\tfrac1d\|W\|_1 - \sqrt{2/\pi}\big|$ as well as combine all the auxiliary results to prove Theorems 2 and 3.

4.4.1 The $\ell_2$ case ($q = 2$)

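Proposition 15. Assume $v \in \ell_2^n$, $\|v\|_2 = 1$, $\|v\|_\infty \le a$ and $W = Pv$. If (6) holds, then for any $t > 0$,

$$\mathbb{P}(W_1^2 + \ldots + W_d^2 \ge d + t) \le \exp\Big(-\frac{t^2}{6.2d + 12t}\Big) + \exp\Big(-\frac{3k}{4na^2} + \log(2d)\Big)$$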



and 

P(VFf + . . . + TyJ < d - t) < exp 



'6d 



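Proof. Denote $Z = W_1^2 + \ldots + W_d^2$. Recall that $W_1, \ldots, W_d$ are independent and $\mathbb{E}W_i^2 = 1$, thus $\mathbb{E}Z = d$.

First we estimate the upper tail. Let $s_0 = \frac32\big(\frac{k}{n}\big)^{1/2}a^{-1}$. Proposition 10 gives

$$\mathbb{P}(|W_i| \ge s) \le 2\exp(-s^2/3) \quad\text{for } 0 \le s \le s_0. \tag{13}$$

Set $X_i = W_i^2\mathbf{1}_{\{|W_i|\le s_0\}}$ and $\tilde{Z} = \sum_{i=1}^d X_i$. Clearly $\mathbb{E}\tilde{Z} \le \mathbb{E}Z = d$, hence the union bound and (13) imply

$$\mathbb{P}(Z - d \ge t) \le \mathbb{P}(\tilde{Z} - d \ge t) + \mathbb{P}(\exists_i\ |W_i| > s_0) \le \mathbb{P}(\tilde{Z} - \mathbb{E}\tilde{Z} \ge t) + 2d\exp(-s_0^2/3). \tag{14}$$

Next, we estimate $\mathbb{P}(\tilde{Z} - \mathbb{E}\tilde{Z} \ge t)$ using the classical Bernstein inequality for the sum of the variables $X_i - \mathbb{E}X_i$. To this end, we need to verify the condition (4). Note that

$$\mathbb{E}|X_i - \mathbb{E}X_i|^2 = \operatorname{Var}(X_i) \le \mathbb{E}X_i^2 \le \mathbb{E}W_i^4 \le 3.1, \tag{15}$$

where the last inequality follows from Lemma 11. To bound higher moments of $|X_i - \mathbb{E}X_i|$ we use Lemma 12 and the fact that $\mathbb{E}X_i \le \mathbb{E}W_i^2 = 1$, which imply

$$\mathbb{E}|X_i - \mathbb{E}X_i|^p \le \mathbb{E}X_i^p + 1 \quad\text{for any } p > 0. \tag{16}$$

By (13), $\mathbb{P}(X_i \ge t) \le 2\exp(-t/3)$ for all $t \ge 0$, hence for any $p > 0$,

$$\mathbb{E}X_i^p = \int_0^\infty pt^{p-1}\,\mathbb{P}(X_i \ge t)\,dt \le 2\int_0^\infty pt^{p-1}\exp(-t/3)\,dt = 2\cdot 3^p\,\Gamma(p + 1). \tag{17}$$

However, for $p = 3, 4, 5$ we use Lemma 11 to get

$$\mathbb{E}X_i^3 \le 17, \quad \mathbb{E}X_i^4 \le 127, \quad \mathbb{E}X_i^5 \le 1283. \tag{18}$$

Using (15) for $p = 2$, combining (16) with (18) for $p = 3, 4, 5$, and combining (16) with (17) for $p \ge 6$, it is a matter of elementary calculations to verify that

$$\mathbb{E}|X_i - \mathbb{E}X_i|^p \le \frac{p!}{2}\,\sigma_i^2 M^{p-2} \quad\text{for any integer } p \ge 2$$

with $\sigma_i^2 = 3.1$ and $M = 6$. Now the classical Bernstein inequality from Theorem 6 implies

$$\mathbb{P}(\tilde{Z} - \mathbb{E}\tilde{Z} \ge t) \le \exp\Big(-\frac{t^2}{6.2d + 12t}\Big),$$

which combined with (14) yields the estimate for the upper tail.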
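For the lower tail we use Lemma 16 (see below):

\[
\mathbb{P}(Z - d < -t) = \mathbb{P}\Bigl(\sum_{i=1}^d \bigl(1 - W_i^2\bigr) > t\Bigr) \le \exp\left(-\frac{t^2}{6d}\right),
\]

since $\mathbb{E}\bigl(1 - W_i^2\bigr)^2 = \mathbb{E}W_i^4 - 2\,\mathbb{E}W_i^2 + 1 = \mathbb{E}W_i^4 - 1 \le 2.1$ by Lemma 11. □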
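The "elementary calculations" behind the moment condition in the upper-tail part can be checked mechanically. The sketch below is illustrative only and assumes the condition has the standard Bernstein form $\mathbb{E}|X_i - \mathbb{E}X_i|^p \le \frac{p!}{2}\sigma^2 M^{p-2}$ with $\sigma^2 = 3.1$, $M = 6$; the moment bounds are those of (15)-(18), and the function names are hypothetical. It also shows why (18) is needed: the crude bound (17) alone fails for $p = 3, 4, 5$.

```python
import math

SIGMA2, M = 3.1, 6.0  # constants claimed in the proof of Proposition 15

def bernstein_rhs(p):
    """(p!/2) * sigma^2 * M^(p-2), the assumed form of the moment condition."""
    return math.factorial(p) / 2.0 * SIGMA2 * M ** (p - 2)

def crude_bound(p):
    """(16) combined with (17): 2 * 3^p * Gamma(p+1) + 1."""
    return 2.0 * 3.0 ** p * math.factorial(p) + 1.0

def moment_bound(p):
    """Upper bound on E|X_i - E X_i|^p assembled as described in the text."""
    if p == 2:
        return 3.1                       # (15)
    sharp = {3: 17.0, 4: 127.0, 5: 1283.0}
    if p in sharp:
        return sharp[p] + 1.0            # (16) with (18)
    return crude_bound(p)                # (16) with (17), p >= 6

# The condition holds on the tested range of integer p.
condition_ok = all(moment_bound(p) <= bernstein_rhs(p) for p in range(2, 61))

# Values of p in {3, 4, 5} for which (17) alone would be too weak.
crude_fails = [p for p in (3, 4, 5) if crude_bound(p) > bernstein_rhs(p)]
print(condition_ok, crude_fails)
```

For $p \ge 6$ the ratio of the crude bound to the right-hand side behaves like $144/(3.1\cdot 2^p)$, which is below 1 from $p = 6$ on, so the finite range tested here reflects the general pattern.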

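Lemma 16. Let $Y_1, \dots, Y_n$ be independent random variables with $\mathbb{E}Y_i = 0$, $\mathbb{E}Y_i^2 \le \sigma^2$ and $Y_i \le 1$ a.s. Then for any $t > 0$,

\[
\mathbb{P}(Y_1 + \dots + Y_n > t) \le \exp\left(-\frac{t^2}{2\bigl(e^{\sigma^{-2}} - 1\bigr)\sigma^4 n}\right).
\]

Proof. Clearly, we may assume $t \le n$. Let $a > 0$ be specified later. Using the inequality

\[
e^x \le 1 + x + \frac{e^a - 1}{a}\cdot\frac{x^2}{2},
\]

which is valid for all $x \le a$, we bound the Laplace transform of $Y_i$ for any $0 < \lambda \le a$:

\[
\mathbb{E}\exp(\lambda Y_i) \le \mathbb{E}\left(1 + \lambda Y_i + \frac{e^a - 1}{a}\,\lambda^2 Y_i^2/2\right) \le 1 + \frac{e^a - 1}{a}\,\lambda^2\sigma^2/2.
\]

Therefore, for any $0 < \lambda \le a$,

\[
\mathbb{P}(Y_1 + \dots + Y_n > t) \le \exp\left(-\lambda t + \frac{e^a - 1}{a}\,\lambda^2 n\sigma^2/2\right). \tag{19}
\]

Taking $\lambda = \frac{a}{e^a - 1}\cdot\frac{t}{n\sigma^2}$ and $a = \sigma^{-2}$ we clearly have $\lambda \le \frac{t}{n}\,\sigma^{-2} \le \sigma^{-2} = a$, thus (19) finishes the proof. □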
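Two ingredients of this argument lend themselves to a quick numerical check. The sketch below assumes the elementary inequality reads $e^x \le 1 + x + \frac{e^a-1}{a}\cdot\frac{x^2}{2}$ for $x \le a$, and that the resulting exponent is $-t^2/\bigl(2(e^{\sigma^{-2}}-1)\sigma^4 n\bigr)$. Both forms are reconstructed from the damaged source, and the function names are hypothetical. It tests the inequality on a grid and evaluates the constant for $\sigma^2 = 2.1$, the value used for the lower tail in Proposition 15.

```python
import math

def quad_majorant(x, a):
    """1 + x + ((e^a - 1)/a) * x^2 / 2, assumed to dominate e^x for x <= a."""
    return 1.0 + x + (math.exp(a) - 1.0) / a * x * x / 2.0

def exponent_constant(sigma2):
    """2 (e^{1/sigma^2} - 1) sigma^4, the assumed factor multiplying n."""
    return 2.0 * (math.exp(1.0 / sigma2) - 1.0) * sigma2 ** 2

# Check e^x <= majorant for several a and for x ranging from a down to a - 20.
a_values = [0.1, 0.5, 1.0 / 2.1, 1.0, 2.0, 5.0]
ineq_ok = all(
    math.exp(x) <= quad_majorant(x, a) + 1e-12   # small slack for rounding
    for a in a_values
    for x in [a - 0.01 * j for j in range(0, 2001)]
)

# With sigma^2 = 2.1 the constant should not exceed 6, giving exp(-t^2/(6d)).
c = exponent_constant(2.1)
print(ineq_ok, round(c, 3))
```

Under these assumptions the constant comes out a little below 6, which is consistent with the rounded denominator $6d$ in the lower-tail bound.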

Proof of Theorem 2. Fix a unit vector $u \in \ell_2^n$. Let $V = HDu$ and set $\alpha := \sqrt{2e\log(2n/\delta)}/\sqrt{n}$.
By Proposition [SI 

\[
\mathbb{P}\bigl(\|V\|_\infty \le \alpha\bigr) \ge 1 - \delta. \tag{20}
\]

Also, recall $\|V\|_2 = 1$ a.s.
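Now, assume $v \in \ell_2^n$ is a fixed unit vector satisfying $\|v\|_\infty \le \alpha$. We shall show that

\[
\mathbb{P}\left(\frac{1}{\sqrt{d}}\|Pv\|_2 > 1 + \varepsilon\right) \le 2\delta/3 \quad\text{and}\quad \mathbb{P}\left(\frac{1}{\sqrt{d}}\|Pv\|_2 < \frac{1}{1+\varepsilon}\right) \le \delta/3. \tag{21}
\]

Since $k \ge 20e\log(2n/\delta)$, (6) is satisfied. For the first inequality in (21), Proposition 15 implies

\[
\mathbb{P}\left(\frac{1}{\sqrt{d}}\|Pv\|_2 > 1 + \varepsilon\right) \le \mathbb{P}\bigl(\|Pv\|_2^2 > d + 2d\varepsilon\bigr) \le \exp\left(-\frac{4d\varepsilon^2}{6.2 + 24\varepsilon}\right) + \exp\left(-\frac{s_0^2}{3} + \log(2d)\right).
\]

By an elementary calculation one can check that for $d$ and $k$ satisfying the assumptions of the theorem, both exp terms do not exceed $\delta/3$. For the second inequality in (21), Proposition 15 gives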

1 IP,,!, < J-') < F fllP^II? <4l- -^^) < exp ' 



where the last exp term is $\le \delta/3$ for $d$ satisfying the hypothesis of the theorem. Thus we proved (21), which implies

\[
\mathbb{P}\left(\frac{1}{1+\varepsilon} \le \frac{1}{\sqrt{d}}\|Pv\|_2 \le 1 + \varepsilon\right) \ge 1 - \delta. \tag{22}
\]

Finally, since the matrix $P$ and the vector $V = HDu$ are independent, conditioning on $V$ and combining (20) with (22) complete the proof. □

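4.4.2 The $\ell_1$ case ($q = 1$)

Proposition 17. Assume $v \in \ell_2^n$, $\|v\|_2 = 1$ and $\|v\|_\infty \le \alpha$. If (6) holds, then for any $t > 0$,

\[
\mathbb{P}\Bigl(\bigl|\,|W_1| + \dots + |W_d| - \bigl(\mathbb{E}|W_1| + \dots + \mathbb{E}|W_d|\bigr)\bigr| > t\Bigr) \le 2\exp\left(-\frac{t^2}{2d + 8t/3}\right).
\]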



Proof. Denote $Z = |W_1| + \dots + |W_d|$. To estimate $\mathbb{P}(|Z - \mathbb{E}Z| > t)$ we use Bernstein's inequality (Theorem 6). To this end, we bound the moments of $|W_i|$. By Proposition 10 and integration by parts, for any integer $p \ge 1$,



E|Wi|P =p j tP-^¥{\W,i\ > t) dt 
Jo 



< 2p / tP-^e-^^/^ dt + 2p / tP-^ exp ( --^^1^^^ ) ds 
Jo Jo 



: p2PT{p/2) + 2p] { ^{n/k)^/'^a ^ 
Plugging the condition (6) we obtain

^Wi\'P < pT {p / 2)2^ + 2p\ 



3V10 



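However, for $p \le 6$ we can provide better bounds. For $p = 4, 6$ we can use Lemma 11 and for $p = 3, 5$ we use the Hölder inequality $\mathbb{E}|W_i|^p \le \sqrt{\mathbb{E}|W_i|^{p-1}\,\mathbb{E}|W_i|^{p+1}}$.
Next, Lemma 12 and Proposition 14 combined with (6) imply

\[
\mathbb{E}\bigl|\,|W_i| - \mathbb{E}|W_i|\,\bigr|^p \le \mathbb{E}|W_i|^p + \bigl(\mathbb{E}|W_i|\bigr)^p \le \mathbb{E}|W_i|^p + \left(\sqrt{2/\pi} + \frac{3}{2\sqrt{10}}\right)^p \le \mathbb{E}|W_i|^p + (4/3)^p.
\]

Also note that

\[
\mathbb{E}\bigl|\,|W_i| - \mathbb{E}|W_i|\,\bigr|^2 = 1 - \bigl(\mathbb{E}|W_i|\bigr)^2 \le 1 - \left(\sqrt{\frac{2}{\pi}} - \frac{3}{2\sqrt{10}}\right)^2 \le \frac{9}{10}.
\]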



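All these yield the bound

\[
\mathbb{E}\bigl|\,|W_i| - \mathbb{E}|W_i|\,\bigr|^p \le \frac{p!}{2}\,\sigma^2 M^{p-2}
\]

for any integer $p \ge 2$ with $\sigma^2 = 1$ and $M = 4/3$, as illustrated by the following table:

\[
\begin{array}{c|l|l}
p & \mathbb{E}\bigl|\,|W_i| - \mathbb{E}|W_i|\,\bigr|^p & \frac{p!}{2}\,\sigma^2 M^{p-2} \\ \hline
2 & \le 0.9 & 1 \\
3 & \le 3.83 & 4 \\
4 & \le 5.73 & \approx 21 \\
5 & \le 10.6 & \approx 142 \\
6 & \le 21.24 & \approx 1138 \\
\ge 7 & \le (\text{bound on } \mathbb{E}|W_i|^p \text{ obtained above}) + (4/3)^p & \frac{p!}{2}\,(4/3)^{p-2}
\end{array}
\]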
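The rows $p = 2, \dots, 6$ of the table can be recomputed from the ingredients quoted in the text. The sketch below is a sanity check under stated assumptions, not the author's computation. It takes $\mathbb{E}W_i^2 = 1$, $\mathbb{E}W_i^4 \le 3.1$ and $\mathbb{E}W_i^6 \le 17$ from (15) and (18), uses $\sqrt{\mathbb{E}|W_i|^{p-1}\,\mathbb{E}|W_i|^{p+1}}$ for odd $p$, and assumes $\bigl|\mathbb{E}|W_i| - \sqrt{2/\pi}\bigr| \le 3/(2\sqrt{10})$, a constant reconstructed from the damaged source. All names are illustrative.

```python
import math

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)
DEV = 3.0 / (2.0 * math.sqrt(10.0))   # assumed bound on |E|W_i| - sqrt(2/pi)|
MEAN_UPPER = SQRT_2_OVER_PI + DEV
MEAN_LOWER = SQRT_2_OVER_PI - DEV

# Even moments quoted in the text: E W^2 = 1, E W^4 <= 3.1, E W^6 <= 17.
EVEN = {2: 1.0, 4: 3.1, 6: 17.0}

def abs_moment(p):
    """Bound on E|W_i|^p; odd p via sqrt(E|W|^(p-1) * E|W|^(p+1))."""
    if p in EVEN:
        return EVEN[p]
    return math.sqrt(EVEN[p - 1] * EVEN[p + 1])

def centered_bound(p):
    """Bound on E||W_i| - E|W_i||^p for p = 2, ..., 6."""
    if p == 2:
        return 1.0 - MEAN_LOWER ** 2
    return abs_moment(p) + MEAN_UPPER ** p

def bernstein_rhs(p, sigma2=1.0, m=4.0 / 3.0):
    """(p!/2) * sigma^2 * M^(p-2) with sigma^2 = 1, M = 4/3."""
    return math.factorial(p) / 2.0 * sigma2 * m ** (p - 2)

table = {2: 0.9, 3: 3.83, 4: 5.73, 5: 10.6, 6: 21.24}  # middle column as printed
rows = [(p, centered_bound(p), bernstein_rhs(p)) for p in range(2, 7)]
for p, lhs, rhs in rows:
    print(f"p={p}: {lhs:.2f} <= {rhs:.2f}")
```

With these inputs the recomputed values agree with the printed middle column up to rounding in the second decimal place, and each stays below the corresponding value of $\frac{p!}{2}(4/3)^{p-2}$.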



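Now, Bernstein's inequality (Theorem 6) completes the proof. □

Proof of Theorem 3. As in the proof of Theorem 2 it is enough to show

\[
\mathbb{P}\left(\left|\frac{1}{d}\|Pv\|_1 - \sqrt{\frac{2}{\pi}}\right| > \varepsilon\sqrt{2/\pi}\right) \le \delta \tag{23}
\]

for any unit $v \in \ell_2^n$ satisfying $\|v\|_\infty \le \alpha := \sqrt{2e\log(2n/\delta)}/\sqrt{n}$.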

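Fix $\kappa \in (0,1)$. The condition (6) is satisfied since $k \ge 20e\log(2n/\delta)$. Proposition 17 used for $t = d\kappa\varepsilon\sqrt{2/\pi}$ implies

\[
\mathbb{P}\left(\left|\frac{1}{d}\|Pv\|_1 - \frac{1}{d}\,\mathbb{E}\|Pv\|_1\right| > \kappa\varepsilon\sqrt{2/\pi}\right) \le 2\exp\left(-\frac{\kappa^2(2/\pi)\,d\varepsilon^2}{2 + \frac{8}{3}\sqrt{2/\pi}\,\kappa\varepsilon}\right) \le \delta,
\]

where the last inequality follows from the assumption on $d$, while Proposition 14 yields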



IT 



< — ^-^= — < (1 - .).y2A, 



where the last inequality follows from the assumption on k. 



□ 